home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Mac100% 1998 November
/
MAC100-1998-11.ISO.7z
/
MAC100-1998-11.ISO
/
オンラインソフト定点観測
/
ユーティリティ
/
Mops 3.2.sea
/
Mops 3.2
/
Mops source
/
PPC source
/
cg1
< prev
next >
Wrap
Text File
|
1998-06-22
|
21KB
|
732 lines
marker m__cg1
¥ =========================================
¥ POWERPC CODE GENERATOR
¥ =========================================
true constant assertions?
¥ make it false when everything really works
PPC?
[IF]
false constant debug?
[ELSE]
false constant debug?
[THEN]
false constant 64bit?
¥ =============== OPTIMIZATION FLAGS ===============
(* it's useful to be able to turn these off, to find those
recalcitrant bugs - and in the case of range checking,
to be able to turn it off for speed-critical tested
code. Or for floating multiply-add, someone might want to
turn it off to restore strict IEEE FP semantics.
NOTE: the default for all these boolean values is TRUE.
*)
true value hoist? ¥ can we move ops out of loops if
¥ they're invariant in the loop?
true value hoist_fetches? ¥ can we move fetches to as early
¥ as possible?
true value allow_match? ¥ true if we can eliminate an op if
¥ we find the result already in
¥ a register. Basically the same
¥ as CSE (common subexpression
¥ elimination).
true value move_by_recompiling? ¥ can we try to recompile an earlier
¥ op to avoid a reg move?
true value optimize_leaf_calls? ¥ do we use the fast calling sequence
¥ for leaf definitions?
true value optimize_branches? ¥ do we simplify branches over other
¥ branches, branches to the next
¥ instruction, etc.?
true value cascade? ¥ do we combine ops into single
¥ instructions where we can?
true value range_check? ¥ do we range-check array accesses?
true value multiply-add? ¥ do we cascade a floating multiply
¥ followed by an add into a
¥ floating multiply-add instruction
¥ (or one of its variants)?
¥ =============== FORWARD DEFNS ================
forward .G
forward .F
forward .C
forward .GS
forward .CS
forward .AL
forward .FAL
forward .FR
forward ZS
¥ =============== OTHER GLOBALS ==============
0 value backstop_CDP
0 value fetch_backstop
0 value basic_block_start
0 value loop_start
0 value max_called_#PL
0 value max_called_#FPL
false value equalizing? ¥ we don't actually use this one currently
false value eq_block_recompiling_move?
bytestring eq_ranges
bytestring const_data
bytestring sv_const_data
(*
Notes on PPC code generation:
The general way to generate optimized code for an architecture with n
registers is to analyse each basic block and generate a directed acyclic
graph (DAG) whose nodes represent each value generated and whose edges
represent uses of those values. Note it isn't a tree, since more than
one edge can go into a node. Then optimization can be done, and common
subexpressions (i.e. nodes) combined.
Then, using graph coloring or whatever, the nodes are assigned to
registers. This assumes there are less regs than nodes, so that we
have to find all dependencies, and re-use a reg when its old value
isn't needed any more.
Here we can simplify things a little, and manage everything in one pass.
On the PPC we have so many regs that for normal shortish basic blocks,
we will generally be able to simply assign a different reg for each node.
On the few occasions when we don't have a free reg, we can select one
which has a value with no outstanding references - there's a slight chance
we could have used this value in a subsequent op, but this is pretty well
negligible. We can optimize on the fly: each time we get a new value,
we can search the reg set for a match so we can use a combined node.
This code is invoked as follows:
Whenever Handlers is called to generate 68K code, if PPC? is true,
the Handlers selector and opcode is pushed and PPCvec is executed.
We will set PPCvec to point to PPC_compile which will use the selector
to dispatch to the right routine.
Notes on memory architecture:
On the PPC, code and data is normally kept separate, with the code
being read-only. The PEF format defines separate code and data sections,
and at launch time these are placed in pointer-based blocks in memory.
We ought to conform to this convention for installed apps, since it will
give the best performance.
In the development environment, our dictionary should be in a read-write
block for obvious reasons. We'll define a separate data area, which
will become the data section of an installed app. These two areas
can be in handles, which will allow us to resize them on the fly if
necessary.
At installation time, we'll generate a PEF in which the dictionary is
added to the existing code section, and the data area is added to the
data section. Our initial nucleus will probably be generated exactly
this way, either from the 68k or PPC version. Thus our initial
nucleus is simply an application, which transforms itself into the
Mops development environment by creating a handle for the new dictionary
and data areas. Now, we could allow the new dictionary area to link
itself to the old one by some cleverness, but since the dic would
only be getting split at one place I don't think it would be worth
the complexity. Better to just BlockMove the old dic area to the start
of the new when the dev environment is being set up, and likewise for
the data area.
Colon uses a new header format incorporating two flag bytes in addition to
what we use on the 68k, and also has to observe 4-byte alignment for the
code. For this reason we generate a full PPC-style header (including 4-byte
alignment) for colon definitions. When we compile a call to a colon defn we
have to do PPC-style alignment even on the 68k. For other types of definition
we don't have to bother, and the code should work on both platforms (since
on the PPC FIND will look after the extra alignment).
The 2 flag bytes are organized as 4 nybbles:
nybble 0: bit 0: 1 if it's a leaf defn
bit 1: 1 if it alters the count register
bit 2: spare
bit 3: 1 if it alters any FP reg
nybble 1: number of results in registers on exit
nybble 2: number of named parameters
nybble 3: number of named parms + locals
(doing it this way is a bit more convenient, and the
max number of parms+locals is only 11, so we have
enough bits)
If the definition does any floating operations or anything that alters
an FP reg, we need more flags, so we add an extra 32 bits. For
alignment, we have to take at least 32, so we might as well make the
most of it.
byte 0:
byte 1:
bytes 2 and 3 are 4 nybbles:
nybble 0:
nybble 1: number of floating results in registers on exit
nybble 2: number of named floating parameters
nybble 3: number of named floating parms + locals
============= REGISTER DEFINITIONS: ================
Notes on reg assignment:
We need 3 regs for scratch in boilerplate code sequences. We'll use
r0, which is a bit unusual anyway, and r11 and r12 (see below).
r1 is the stack pointer, r2 is RTOC. We can't monkey with these.
r11 and r12 are used in the calling sequence for external calls. Apple
says they can be used as scratch at all other times, so we'll use them
in boilerplate sequences. PowerMacForth does this too, and in the assembler
they're aliased as rX and rY.
For external calls, r3-r10 are used for parameters, and r3 for a returned
result. They won't be saved over the calls. Of course for internal Mops
calls we can do what we like. We can use these regs for general operands,
and on an external call normalize the stack so that the required number of
cells are stored in r3 on. At that stage we won't have any other cached
stack cells, so we don't need the regs preserved anyway.
This scenario gives us 8 regs for general operands, i.e. cached stack
cells (r3-r10), which should be enough. If it turns out not to be enough
we could grab a couple of the regs we've allocated for locals (see below).
r13-31 are "non-volatile" - they're saved over external calls. For
internal Mops calls we just need to save the locals, since other regs
like the base address regs don't get changed.
Now for the special regs we need. These all need to be saved over external
calls, and so are in the non-volatile block.
For addressing the dictionary, a difference to the 68k version is that
we need to keep code and data separate (see above). In the development
environment these will be in handle-based blocks, and in an installed
app they'll be defined in the PEF which will make them end up in pointer-
based blocks. Anyway, as long as we handle the addressability questions
properly, it shouldn't matter where they are.
Code references will be for branches, constants and other constant data
like literal strings and class info.
Constants can always be handled via literal instructions. Even if it
needs more than 16 bits it can be generated in a reg with 2 instructions
which will be faster than a memory reference.
Branches will always be self-relative, and they have enough displacement
bits to get us anywhere.
For other constant data, however, it's extremely handy to have a base
reg available, even if these references aren't performance-critical.
Thus, if we still have modules (which is still up for grabs), we'll
need 4 base regs - 2 to address code and 2 for data. Note that on
entry, r2 (RTOC) always points to the start of the data area as defined
in the PEF. But we can't use RTOC as a regular base reg, since in the
dev environment our data area will be off in a separate handle.
Other regs we need are RP (return stack pointer), the loop variable I,
and the base address of the current object. Now we may as well use one of
our "local" registers for I, since it will be very rare for us to need
all of them. This will mean one less local in definitions that use I,
but that's not a problem.
It seems we should go for a separate FP stack, since the Scientific
Library is now using this. This will also give better code on the PPC, since
when we bring stack cells back from mem to regs, we'll always know which regs
to move them to. The floating stack pointer probably should be in a
register.
So in all we'll need 7 special regs out of the non-volatile block.
This leaves r19-r31 for locals, which means that if we limit the number
of locals to 13, we can keep them all in registers. This looks reasonable.
**** Side note: we won't use the lmw and stmw instructions, since Keith D has
warned me that they'll go away in future! This means that we can use
their values as pseudo-opcodes, since we know that Mops will never generate
them with their proper meaning. The values in question are primary opcodes
46 and 47, which means $B8xxxxxx to $BFxxxxxx. In particular, we'll
use $BE00 as a handler field for PPC colon definitions. (Handler fields will
be 4-byte aligned, and so will appear in the same place as instruction
opcodes.) This will make life easier for the disassembler.
Some notes on register handling across internal Mops calls:
Saving and restoring local regs can be a bit long-winded, so to save
space we should normally do what we do on the 68k - that is, at the
start of each defn, save whatever regs we need for locals, and restore
them at the end. For EXIT, instead of doing everything inline as we
did on the 68k, we'll do a branch to the semicolon. This is almost as
fast (esp. as it's an unconditional branch), and saves space.
We'll probably make an exception for leaf procs, since these get executed
so frequently. What I'm currently planning, in the case where the leaf
proc has named parms/locals, is to do the houskeeping in the calling
routine instead of in the called leaf proc. This will give me the
possibility of generating the parms straight in the required regs.
Also I might be able to do the saving and restoring of the needed regs
at the beginning and end of the calling routine (depending on what
parms/locals that routine might need. This would get these housekeeping
operations out of any inner loops.
This alternative calling convention should certainly be faster, but will
take a lot more space, so I won't do it all the time.
========================================
*)
0 constant CR0
4096 constant sys_SP_framesize
8 constant FPcell
(* We don't move the stack pointers every time we push and pop something
- rather, we keep track of the accumulated offset here and only adjust
when we have to.
*)
0 value STK_OFFSET
0 value FSTK_OFFSET
¥ ================= OD FIELD VALUES ==================
¥ Here we define various values for OD fields. We generally use the same
¥ names as for the 68k code generator, although the values aren't all
¥ the same (which doesn't matter anyway).
¥ Mode values:
enum { mdGPR mdBD mdAbs mdLit mdPC mdFPn }
PPC? [IF] hexx [ELSE] hex [THEN]
¥ Flag byte bits (opFlags field):
0 constant flExt ¥ Sign extend
1 constant fbExt
1 constant flFP ¥ Floating operation
2 constant fbFP
2 constant flLit ¥ Floating Literal
4 constant fbLit
3 constant flFCR ¥ FPU constant ROM reference
8 constant fbFCR
¥ 10 constant fbNoRecompile?
¥ Operation type byte:
¥ 0 means empty, 1 to 7 mean not empty but unknown for some reason.
¥ So far we've only defined 1 and 2.
1 constant otUnknown ¥ isn't empty, but we don't know anything about
¥ the contents
2 constant otUnkStored ¥ ditto, but we've stored it and so might be
¥ able to optimize a fetch from the same
¥ location, even tho we don't know its type
3 constant otUnkPulled ¥ ditto, pulled from memory part of stack
7 constant otUnknownCodes ¥ all codes less than or equal to this are
¥ some kind of empty/unknown
8 constant otMove ¥ reg move
¥ the following are also on the 68k, and in target compilation they
¥ dispatch to us from the 68k interpreter using these codes, so we
¥ can't change them.
12 constant otMUL
13 constant otDIV
14 constant otUDIV
¥ the following are PPC only:
10 constant otMULH ¥ multiply high
11 constant otUMULH ¥ multiply high
16 constant otAddc
¥ 17 constant otAddic
17 constant otAdde
18 constant otAddze
19 constant otAddme
1A constant otSubfc
¥ 1C constant otSubfic
1B constant otSubfe
1C constant otSubfze
1D constant otSubfme
¥ these following are also 68k codes, and can't change
21 constant otADD
22 constant otSUB
23 constant otAND
24 constant otOR
25 constant otXOR
26 constant otCMP
27 constant otUCMP ¥ unsigned compare - PPC only
28 constant otNEG
29 constant otNOT
2A constant otShift
2B constant otShift&mask ¥ PPC only
2C constant otTrap ¥ ditto
30 constant otPMend ¥ End of integer ops
¥ 3F constant otFPcmp ¥ Floating-point comparison. A special case.
40 constant otFPstart ¥ Start of regular floating-point ops. Note
¥ these are NOT in the same order as the
¥ integer ops.
40 constant otFMOVE
41 constant otFADD
42 constant otFMUL
43 constant otFMADD ¥ multiply-add - PPC only
48 constant otFPnoncom ¥ The following FP ops are non-commutative
48 constant otFSUB
49 constant otFDIV
4F constant otFPcmp ¥ FP comparisons
50 constant otFPstore
52 constant otFPfetch
54 constant otFPmon ¥ The following FP ops are monadic
(* we don't bother including these in the dictionary, but these
are the values:
54 constant otFABS
55 constant otFNEG
56 constant otFSIN
57 constant otFCOS
58 constant otFTAN
59 constant otFATAN
5A constant otFSQRT
*)
5F constant otFPend ¥ End of FP ops
60 constant otStore ¥ Store
80 constant otDeferredStore
62 constant otFetch ¥ Direct fetch
otFetch
constant otAt ¥ Indirect fetch - ends up being treated
¥ the same
62 constant otDUP ¥ Stack shuffling
63 constant ot2DUP
64 constant otDROP
65 constant ot2DROP
66 constant otSWAP
67 constant otOVER
(*
68 constant otNIP
69 constant otTUCK
6A constant otROT
6B constant otDOWN
6C constant ot2SWAP
*)
¥ For FP stack shuffling, we'll use the corresponding
¥ opcodes in the range 72 - 7B, but not bother defining
¥ constants and cluttering the dic too much.
¥ Subtype byte values:
¥ For comparisons, the top 4 bits of the byte give the condition register field
¥ bit which we need to branch on for this condition: LT = 0, GT = 1, EQ = 2.
¥ The low bit (bit 7) is 1 if the condition we want has a field bit value of 1.
¥ Bit 6 is 1 for unsigned, and bit 5 is 1 for comparisons with zero.
20 constant cmpNE
21 constant cmpEQ
00 constant cmpGE
01 constant cmpLT
10 constant cmpLE
11 constant cmpGT
02 constant cmpHS
03 constant cmpLO
12 constant cmpLS
13 constant cmpHI
24 constant cmpZNE
25 constant cmpZEQ
04 constant cmpZGE
05 constant cmpZLT
14 constant cmpZLE
15 constant cmpZGT
¥ here's a table to map our 68k comparison codes to the above PPC ones:
PPC? [IF]
createx comparison_codes
0 cx, ¥ 0
0 cx, ¥ 1
cmpHI cx, ¥ 2
cmpLS cx, ¥ 3
cmpHS cx, ¥ 4
cmpLO cx, ¥ 5
cmpNE cx, ¥ 6
cmpEQ cx, ¥ 7
0 cx, ¥ 8
0 cx, ¥ 9
0 cx, ¥ A
0 cx, ¥ B
cmpGE cx, ¥ C
cmpLT cx, ¥ D
cmpGT cx, ¥ E
cmpLE cx, ¥ F
decimalx
[ELSE]
create comparison_codes
0 c, ¥ 0
0 c, ¥ 1
cmpHI c, ¥ 2
cmpLS c, ¥ 3
cmpHS c, ¥ 4
cmpLO c, ¥ 5
cmpNE c, ¥ 6
cmpEQ c, ¥ 7
0 c, ¥ 8
0 c, ¥ 9
0 c, ¥ A
0 c, ¥ B
cmpGE c, ¥ C
cmpLT c, ¥ D
cmpGT c, ¥ E
cmpLE c, ¥ F
decimal
[THEN]
0 value OPERATION
0 value SUBOPERATION
¥ Some types of instruction need special treatment - e.g. AND etc. use the
¥ rA field for the destination. So we define some types:
enum { noType loadStoreType arithType logicalType cmpType branchType shiftType }
¥ ================= UTILITY WORDS ==================
: MONADIC? ¥ ( opcode -- opcode b )
otNeg otNot within? IF true EXIT THEN
otFPmon otFPend within?
;
: MEM_REFERENCING? ¥ ( opcode -- b )
SELECT[ otFetch ],
[ otStore ],
[ otFPFetch ],
[ otFPStore ]=> true
DEFAULT=> drop false
]SELECT
;
PPC?
[IF]
: dasm ; ¥ can't disassemble in native mode yet!
: z ;
[ELSE]
forward dasm ¥ disassembles what we've done so far
forward dcurr ¥ disassembles the current defn (even if not finished)
forward Z ¥ ends ppc compilation and disassembles.
[THEN]
¥ GetImmediateOp does the same when we're going to execute the operation
¥ now, returning the xt of the word to execute.
: GETIMMEDIATEOP { opType subtype -- xt }
opType
SELECT[ otAdd ]=> ['] +
[ otSub ]=> ['] -
[ otAND ]=> ['] and
[ otOR ]=> ['] or
[ otXOR ]=> ['] xor
[ otMUL ]=> ['] *
[ otDIV ]=> ['] /
[ otNEG ]=> ['] negate
[ otNOT ]=> ['] not
[ otCMP ]=>
subtype
CASE[ cmpNE ]=> ['] <>
[ cmpEQ ]=> ['] =
[ cmpGE ]=> ['] >=
[ cmpLT ]=> ['] <
[ cmpLE ]=> ['] <=
[ cmpGT ]=> ['] >
[ cmpZNE ]=> ['] 0<>
[ cmpZEQ ]=> ['] 0=
[ cmpZGE ]=> ['] 0>=
[ cmpZLT ]=> ['] 0<
[ cmpZLE ]=> ['] 0<=
[ cmpZGT ]=> ['] 0>
DEFAULT=> cr .h ." undef otCMP subtype in getImmediateOp" 1 die
]CASE
[ otUCMP ]=>
subtype
CASE[ cmpHS ]=> ['] u>=
[ cmpLO ]=> ['] u<
[ cmpLS ]=> ['] u<=
[ cmpHI ]=> ['] u>
DEFAULT=> cr .h ." undef otUCMP subtype in getImmediateOp" 1 die
]CASE
[ otShift ]=>
subtype
CASE[ 0 ]=> ['] <<
[ 1 ]=> ['] >>
DEFAULT=> cr .h ." undef otShift subtype in getImmediateOp" 1 die
]CASE
DEFAULT=> cr .h ." undef op passed to getImmediateOp" 1 die
]SELECT
;
PPC? not
[IF] ¥ we put this in pnuc3 in target mode
¥ 16bits? ( n signed? -- n b )
¥ returns true if n will fit in 16 bits (signed or unsigned as requested).
: 16BITS? ¥ ( n signed? -- n b )
IF -32768 32767 within?
ELSE
dup 16 >> 0=
THEN
;
[THEN]
: SIGNED? ¥ ( operation - b )
¥ Returns true if this is a signed op - assumed to be the
¥ default. We just return false for the specific unsigned
¥ ops, and true for everything else.
SELECT[ otUDIV ],
[ otAND ],
[ otOR ],
[ otXOR ],
[ otUCMP ]=> false
DEFAULT=> drop true
]SELECT
;
: REVERSE_COMPARISON ¥ called if we swap operands of a compare. Adjust
¥ subOperation appropriately.
subOperation $ 24 and ?EXIT ¥ out if monadic or EQ or NE
$ 10 xor> subOperation ;
¥ This must come before we redefine @ABS below!
PPC? not
[IF]
from pasmMod import{ :PPC_code ;PPC_code
disasm disasm_word disasm_xt
disasm_rng disasm_cnt disasm_one
set_disasm_call_range }
compile: pasmMod
¥ While still in 68k-land, we need a PPC-style reloc! and @abs. The proper
¥ PPC versions will be compiled in pnuc3 and setup respectively. Note,
¥ any changes must of course be made in both places.
: RELOC! { theAddr dest -- }
theAddr addr>S&D
$ ffffff and swap 24 << or
dest !
;
: reloc, DP reloc! 4 ++> DP ;
: relocCode, CDP reloc! 4 ++> CDP ;
: displCode, CDP displ! 4 ++> CDP ;
¥ In the target compilation, we only have 4 segments.
: @ABS { addr ¥ relocAddr seg# displ -- absAddr }
addr @ -> relocAddr
relocAddr $ ffffff and -> displ
relocAddr 24 >> -> seg#
seg#
SELECT[ 8 ]=> code_start
[ 9 ]=> data_start
[ 10 ]=> seg_code_start
[ 11 ]=> seg_data_start
DEFAULT=> ." not a reloc addr" 1 die
]SELECT
displ +
;
: @abs6 @abs ; ¥ a variant name we can call later when @abs
¥ has been redefined in the PPC image
[THEN]